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1.  Introduction 


The  quantal  response  (QR)  model  seeks  to  characterize  the  value  of  a  binary  re¬ 
sponse  y  6  {0, 1}  as  a  function  of  continuous  stimulus  x.  The  original  formulation 
postulates  an  unobservable  stimulus  limit  L,  which  determines  the  response  via 


(1) 


If  L  is  known  and  constant,  then  y  is  a  step  function  of  x  with  jump  at  x  -  L. 
Otherwise,  suppose  L  is  random  with  cumulative  distribution  function  (CDF) 


(2) 


FL(t)  =Pr[L<f]. 


The  distribution  of  L  determines  the  QR  model  probability  of  response  as 


P(x)  =  Pr[y  =  l|x]=Pr[L<x]  =  FL(x) . 


(3) 


Assuming  a  specific  distribution  family  for  Fi  such  as  logistic,  G{x)  =  (1  +  e  x )  1 , 


or  normal,  G(x )  =  (2 n)  !^2  e  2  dt,  fitting  the  model  amounts  to  estimating 
the  limit  distribution  location  and  scale  parameters  m  and  s  in 


(4) 


The  CDF  of  L  is  interpreted  as  a  functional  model  for  the  probability  of  response. 

The  existence  of  L  is  not  necessary,  and  as  such  the  QR  model  is  a  special  case  of 
the  Generalized  Linear  Model  (GLM).  See  Collins-!-^  for  details.  The  requirement 
of  increasing  P  can  be  relaxed,  and  then  P  is  no  longer  equivalent  to  any  CDF 
(which,  of  course,  must  always  be  increasing).  What  remains  is  a  purely  functional 
model  with  no  implicit  limit  distribution.  Several  such  formulations  follow. 

2.  General  QR  Models 

A  binary  response  y  6  {0, 1}  has  an  expected  value  depending  on  stimulus  x. 


E  [  y  |  x]  =  P{x)  . 


(5) 
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The  conditional  distribution  of  y  is  Bernoulli  with  the  indicated  mean,  so 

Pr [ y  =  1  \x\  =  E[y  \  x]  =  1  -Pr[y  =  0  \  x]  .  (6) 

In  the  usual  case,  response  is  an  increasing  function  of  x,  so  the  response  function 
P  must  be  a  CDF.  A  standard  choice  for  P  uses  the  logistic  CDF  G(x)  =  (1  + 
e~x)~l  with  the  linear  parameterization  P(x)  =  G(a  +  bx)  or  the  location-scale 
parameterization  P{ x)  =  G  ({x  -  m)/s). 

This  is  inadequate,  for  example,  in  the  presence  of  the  shattergap  phenomenon, 
when  the  probability  of  penetration  (y)  is  observed  to  decrease  in  some  velocity 
(x)  range.  An  example  data  set  is  taken  from  Chang  and  Bodtr  described  therein 
as  “Results  of  69  Ballistic  Shots  on  Phase  II  Al2C>3/Kevlar  Armor  Plates”,  although 
there  are  only  68  data  points  in  the  report. 

Chang  and  Bodt  also  develop  a  specific  parametric  model  that  must  be  analyzed 
from  first  principles  (not  GLM).  This  is  presented  in  Section  3. 

The  remaining  approaches  are  all  based  on  GLM.  Standard  Bernoulli  GLM  can 
estimate  an  arbitrarily  complex  response,  not  necessarily  increasing,  as  in  Section  4. 
Nonparametric  estimation  of  the  response  is  accomplished  with  penalized  B-spline 
models,  as  in  Section  5.1,  or  smoothing  spline  models,  as  in  Section  5.2. 

3.  The  Chang-Bodt  QR  Model 

The  Chang-Bodt-  model  is 


P{x)  =  (1  -  Pz(x,mz,sz ))  •  P\(x,m\,s\)  +  Pz(x,mz,sz )  •  P2(x,m2,s2)  ,  (7) 

where  the  P,  are  location-scale  CDFs.  Pi  is  the  (monotonic)  probability  of  penetra¬ 
tion  for  an  unshattered  threat,  and  P2  is  the  probability  of  penetration  for  a  shattered 
threat.  Pz  is  the  probability  of  shatter.  For  low  x,  shatter  is  unlikely,  Pz  ~  0,  and 
P  ~  P\.  For  high  x,  shatter  is  likely,  Pz  ~  1,  and  P  ~  P2.  In  the  intermediate  x 
range,  the  mixture  is  weighted  according  to  the  increasing  shatter  probability  Pz. 
For  convenience,  let  R  =  1  -  P. 
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The  likelihood  for  a  single  observation  ( xi,yi )  is 


U  =  Rz(xi)  •  Pi (*/)* tfiUi)1-*'  +  0  •  P2(xi)yi R2{xi)l-yi ,  (8) 


where  the  parameters  of  Pk  are  (ra^s^). 


^zUf)  •  +  P-O)  •  Ro^xi)  ,  yi  =  0 


PzOo)  '  P\(xi)  +  Pz(xi)  ■  P2(xi)  ,  y{  =  1 


(9) 


For  the  logistic  CDF,  we  have  G(x )  =  (1  +  exp(-x))  1 .  The  usual  location-scale 


parameterization  is  P{x)  =  G((x  -  m)/s). 

Parameter  estimates  can  be  obtained  by  numerical  optimization  of  the  negative  log- 
likelihood  A  =  -  £  l°g  Li-  The  parameters  reported  by  Chang  and  Bodt  are  not 
optimal  estimates.  Both  sets  of  parameter  estimates  are  given  in  Table  1. 


Table  1  Chang-Bodt  model  parameters 


Model 

mi 

*1 

m2 

«2 

mz 

sz 

A 

Reported 

1650 

100 

2550 

100 

2050 

100 

22.58 

Estimated 

1646.97 

90.49 

2529.34 

75.75 

2020.15 

87.80 

21.91 

See  Fig.  1  for  the  response  curves. 
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Fig.  1  Chang-Bodt  model 
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4.  Parametric  QR  Models 


GLM  with  the  Bernoulli  response  provides  a  way  to  characterize  a  possibly  nonin¬ 
creasing  function  as  P{x)  =  G  ( f{x )),  where  G  is  a  monotone  function  such  as  the 
standard  logistic,  normal,  or  Cauchy  CDF.  Finite-dimensional  parametric  models 
are  obtained  by  using  some  basis  (/i,/2,.  •  ■),  and  then  f(x)  =  X' ft  for  fixed  k 
where  [i  =  (/3U. . .  ,/3k)  and  X  =  (/i(.r),. . .  ,/*(. *)).  So,  we  get 

P{x)  =  G(Xr/3) .  (10) 

This  accounts  for  the  (monotonic)  basic  linear  or  location-scale  model 

(X  —  Tfl  \ 

)  (li) 

and  polynomials  of  arbitrary  degree 

p(x)  =  G(z^)  •  (12) 

The  canonical  polynomial  basis  is  given  by  fi(x)  =  x' .  In  practice,  we  use  an  or¬ 
thogonal  polynomial  basis,  with  degr ee(/;)  =  i  and  j  =  0  if  i  ±  j.  This 
eliminates  numerical  problems  and  provides  the  same  solution  as  the  canonical  ba¬ 
sis  for  each  k.  Other  popular  choices  for  basis  sets  include  the  natural  spline  and 
B-spline.  See  Fig.  2  for  basis  set  examples  with  dimension  k  =  5  where  various 
colors  distinguish  the  basis  elements.  Figure  3  shows  logistic  response  estimates 
for  these  3  basis  sets  with  dimension  d  =  3,4, 5, 6. 
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Fig.  2  Regression  basis  sets 
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Fig.  3  Logistic  regression 
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5.  Nonparametric  QR  Models 


Nonparametric  linear  models  are  presented  in  Appendix  A.  These  arise  from  a  pro¬ 
cess  that  can  be  described  as  penalized  maximum  likelihood  estimation,  penalized 
least  squares,  or  smoothing. 

In  Appendix  B,  the  iteratively  reweighted  least  squares  (IRLS)  GLM  estimation 
procedure  for  QR  models  is  adapted  to  use  nonparametric  penalized  linear  models. 
This  gives  rise  to  nonparametric  penalized  QR  models. 

5.1  The  P-spline  QR  Model 

Eilers  and  Marx-  propose  using  an  overfitted  B-spline  model  and  then  penalizing 
to  produce  a  smooth  fit.  This  is  called  a  P-spline  model.  In  this  application,  we  fit 
a  penalized  GLM  with  a  degree  3  B-spline  basis  of  size  p  =  32  penalized  with  the 
second  derivative  D.  The  smoothing  operator  is  S  =  eA  D. 

Selection  of  the  smoothing  parameter  is  usually  accomplished  by  optimizing  some 
information  or  cross-validation  quantity  such  as  the  Akaike  information  criterion 
(AIC),  ordinary  cross-validation  (OVC),  or  generalized  cross-validation  (GCV)  as 
described  in  Appendix  A.  There  are  other  such  quantities  and  other  methods,  and  no 
single  procedure  is  known  to  give  the  best  solution.  In  fact,  no  single  procedure  even 
works  for  all  data  sets.  So  the  choice  of  smoothing  parameter  selection  procedure 
is  itself  somewhat  subjective.  To  compensate  for  the  perception  that  the  procedures 
tend  to  oversmooth  the  response,  an  adjusted  smoothing  parameter  is  computed  as 
the  minimum  of  the  3  values  obtained  less  their  range.  For  this  data,  Ta;c  =  -6.2, 
A gcv  —  —6.0,  docv  —  —5.7,  and  A adj  —  6.7. 

See  Fig.  4  for  fits  with  various  values  of  the  smoothing  parameter.  Optimal  solutions 
for  the  3  smoothing  selection  methods  along  with  the  adjusted  solution  are  shown 
in  Fig.  5.  The  adjusted  solution  and  confidence  intervals  are  shown  in  Fig.  6. 
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Fig.  4  P-spline  optimization 
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Fig.  5  Optimal  and  adjusted  P-spline  solutions 
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Fig.  6  Adjusted  P-spline  solution  with  90%  confidence  intervals 
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5.2  The  S-spline  QR  Model 


The  smoothing  spline,  or  S-spline,  QR  model  has  the  form  P(x )  =  G(  f(x)),  where 
/  is  an  S-spline  linear  model  with  knots  at  the  stimulus  data  points,  and  G  is  an 
arbitrary  link  function.  Wahba-  is  the  standard  reference  for  smoothing  splines. 

Smoothing  splines  are  obtained  through  optimization  in  certain  function  spaces, 
and  the  GLM  implementation  described  in  Section  B.3  accomplishes  this  using 
the  standard  IRLS  GLM  algorithm.  In  this  application,  we  fit  a  cubic  smoothing 
spline  GLM  by  penalizing  with  the  second  derivative  D.  The  smoothing  operator  is 
S  =  e'lD. 


We  allow  for  multiple  observations  at  a  single  stimulus  by  averaging  the  response 
and  multiplying  the  weight  by  the  observation  multiplicity  at  that  level.  However, 
the  sum  of  squared  errors  (SSE)  and  A  are  computed  from  the  original  data,  so  they 
are  comparable  with  the  other  models. 

Smoothing  parameter  selection  methodology  is  the  same  as  for  the  P-spline,  Sec¬ 
tion  5.1.  Typically,  different  A  values  are  obtained. 

See  Fig.  7  for  fits  with  various  values  of  the  smoothing  parameter.  Optimal  solutions 
for  all  3  smoothing  selection  methods  and  the  adjusted  solution  are  shown  in  Fig.  8. 
The  adjusted  solution  and  confidence  intervals  are  shown  in  Fig.  9. 
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Fig.  8  Optimal  S-spline  solutions 
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Fig.  9  Adjusted  S-spline  solution  with  90%  confidence  intervals 
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6.  Model  Comparison 


Table  2  displays  fit  statistics  and  Fig.  10  shows  the  superimposed  solutions  for  se¬ 
lected  methods:  degree  6  parametric  models,  reported  and  estimated  Chang-Bodt 
(CB),  and  adjusted  P-spline  and  S-spline.  The  adjusted  smoothing  spline  solution 
is  the  best  fit  in  terms  of  squared  error  and  maximum  likelihood  measures. 

Table  2  Goodness  of  fit  comparison 


Model 

SSE 

A 

Polynomial  (6) 

7.237 

22.49 

B-spline  (6) 

7.212 

22.34 

Natural  spline  (6) 

7.201 

22.07 

CB  reported 

7.310 

22.58 

CB  estimated 

7.186 

21.91 

P-spline  adjusted 

7.045 

21.98 

S-spline  adjusted 

6.847 

21.37 
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Fig.  10  All  solutions 
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7.  Conclusions  and  Recommendations 


The  simple  location- scale  quantal  response  model  may  not  adequately  capture  the 
behavior  of  observed  phenomena. 

Higher-order  polynomial  and  finite-dimensional  spline  basis  models  allow  for  more 
complicated  responses  as  the  polynomial  degree  or  spline  basis  dimension  increases. 

Penalized  B-spline  (P-spline)  and  smoothing  spline  (S-spline)  models  offer  the  most 
flexibility  as  these  are  nonparametric  (not  constrained  to  any  particular  functional 
form).  These  should  be  useful  in  identifying  nonstandard  behavior  via  statistical 
goodness-of-fit  tests. 
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A.1  The  Basic  Linear  Model 


The  usual  linear  model  is 

Y  =  X/3  +  s ,  (A-l) 

where  Y  is  an  n x  1  response,  X  i s  an  n x p  independent  matrix,  [3  is  a  px  1  parameter, 
and  the  n  x  1  error  e  ~  N(0,cr2In )  is  normally  distributed. 

For  any  positive-definite  symmetric  matrix  W,  the  corresponding  weighted  inner 
product  and  norm  are,  respectively,  (x,y)w  =  xfWy  and  \\x\\w  =  (x,x }^2.  So, 
in  general,  the  density  of  a  normal  vector  with  mean  M  and  variance-covariance 
matrix  V  can  be  written  as 

fix)  =  {2nTnl2\V\-xl2exV[-\\\x-M\\2v_x]  .  (A-2) 

For  the  model  of  Eq.  A-l,  we  have  E Y  =  M  =  X/3  and  Vary  =  V  =  cr2/„.  Each 
column  of  A  is  a  linear  predictor,  and  the  model  is 

yi  =  Yj  X‘jPi  ■  (A-3) 

7=1 

We  can  work  with  a  p-parameter  model  for  a  single  predictor  v  by  choosing  a  set  of 
fixed  basis  functions  {/i, . . .  ,fp)  and  setting  Xjj  =  fj(vj). 

p 

yi  =  J]Pjfj(vi)-  (A-4) 

7=1 


For  example,  the  choice  of  fj(v)  =  v-'  1  gives  the  polynomial  model 


y  =  f30  +  f3  iv  +  (32v 2  +  •  •  •  (3p-\vp  1  .  (A-5) 

See  Eq.  A-2.  Solution  by  least  squares  is  equivalent  to  maximum  likelihood  esti¬ 
mation  for  normal  error,  and  the  criterion  is  to  choose  u  that  minimizes  Q  =  sre  = 
||£||2  =  || Y  -  X/3\\2  since  s  =  Y-X/3.  This  is 

Q  =  \\Y\\2-2/3,X,Y  +  \\X/3\\2,  (A-6) 
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and  by  setting  the  derivative  to  0, 


—  =  -2  X’Y  +  2X'X  (3  =  0,  (A- 7) 

d(3 

we  obtain  the  normal  equations 


ii 

(A- 8) 

with  solution 

p  =  {x’xy'x'Y 

(A-9) 

and  response  estimate 

Y  =  HY , 

(A- 10) 

where  the  so-called  hat  matrix  is 

H  =  x^xy'x1 . 

(A- 11) 

Note  that  E  ft  =  (3  and  Var  (3 

=  cr2(xtxy1. 

A.2  The  Weighted  Model 

When  the  error  is  N{ 0,2),  the  correct  inner  product  is  weighted  by  the  symmetric 
W  =  5T1,  and  so  Q  =  s'We  =  \\s\\^  =  \\Y  -  Xp\\^.  This  is 

Q  = 

\\Y\\2w-2ptXtWY+\\Xp\\2w. 

(A- 12) 

Then 

dQ 

d/3 

=  -2X'WY  +  2X'WX(3  =  0, 

(A- 13) 

the  normal  equations  are 

X'WXp  =  XrWY, 

(A- 14) 

the  solution  is 

[3  =  (X'WXy1  X'WY  , 

(A-15) 

and  the  response  estimate  is  Y  =  HY,  where 

h  =  x(xtwxy1xtw . 

(A- 16) 

Note  that  E  (3  =  (3  and  Var  p  =  (. X'WXy \ 
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For  these  models,  the  likelihood  function  is 


L(fi)  =  (2tt) -w/2 | S r 1/2  exp [-  J 1 1 K  -  ^>0 1 1|_,  ]  ,  (A- 17) 

its  log  derivative  is 

log  L(P)  =  X'I.-l(Y-X/3),  (A- 18) 

d(3 

and  the  information  matrix  is 

Mr  =  A'S_1X  .  (A- 19) 


A.3  Smoothing 

If  the  system  is  ill-conditioned,  we  can  maximize  a  penalized  likelihood  function 

Ls(/3)  =  L(t 3)  •  exp[-±||S/3||2]  .  (A-20) 

Equivalently,  penalization  can  be  applied  to  the  least-squares  formulation  by  mini¬ 
mizing  Q  =  ||e||^  +  ]|5’/?||2  for  some  smoothing  operator  S.  This  is 

Q  =  \\Y\\2W  -  IpX'WY  +  \\X(3\\l  +  \\p\\2s,s  .  (A-21) 

Then 

^  =  -2XrWY +  2X’WX/3 +  2S,S(3  =  0,  (A-22) 

d(5 

and  the  normal  equations  are 

(X'WX  +  SlS)p  =  X'WY  (A-23) 

with  solution 

p  =  (X‘WX  +  S*S)~1XtWY  (A-24) 

and  response  estimate  Y  =  HY  where 

H  =  X(XrWX  +  Sf5)_1Arl¥  .  (A-25) 

This  is  equivalent  to  the  model  Y*  =  X*  f3  +  s*  with  weight  W*  where  Y*  =  [  q  j , 
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[  ^  j ,  and  W*  =  [  ^  ®  ] .  Then  the  normal  equations  are 


X*  = 


[xr  s'] 


W  0 
0  / 


[x'  s'] 


X 

s 

wx 


X 

S'] 

W  0 

0  I 

X 

S'\ 

WY 

0 

(X'WX  +  S'S)/3  =  X'WY 


(A-26) 


as  required.  This  so-called  augmented  representation  is  useful  for  computation. 


Modem  software  for  linear  least-squares  estimation  operates  on  Eqs.  A-8,  A-14, 
and  A-26  through  the  response  vector  Y,  the  weight  vector  W,  the  smoothing  vector 
S,  and  the  design  matrix  X.  The  normal  equations  are  solved  efficiently  without 
inverting  the  design  matrix,  and  we  get  parameter  estimates  and  diagnostics  such  as 
the  parameter  variance  and  hat  matrix  diagonal. 


A.4  Nonparametric  P-spline 

Smoothing  or  penalization  can  be  applied  to  an  overfitted  parametric  model  as  given 
by  Eq.  A-4.  Suppose  /  has  the  particular  form 

p 

f(x)  =  £  PjfjiM)  (A-27) 

7=1 


for  a  B-spline  basis  (/i,. . .  ,fp)  and  D  is  a  linear  differential  operator.  Computa¬ 
tional  details  for  the  B-spline  are  in  Appendix  C,  and  for  the  differential  operator 
see  Appendix  D.  The  solution  is  the  minimizer  ( fi\,. . .  ,/3p)  of 


+  A2  fb(D(Yl3jfj)<u)f<lu  (A-28) 

/=!  7=1  Ja  7=1 


and  is  again  given  exactly  by  Eq.  A-21.  This  works  because  of  the  ordered  par¬ 
tial  partition  property  of  B-spline  bases:  smoothing  the  coefficient  vector  in  fact 
smooths  the  solution.  Eilers  and  Marx1  call  this  a  P-spline  (penalized  B-spline) 
model.  Figure  A-l  depicts  the  effect  of  smoothing  parameter  variation  for  the  P- 
spline  model. 

'Eilers  PHC,  Marx  BD.  Flexible  smoothing  with  B-splines  and  penalties.  Statistical  Science. 
1996;1 1(2):89— 121 . 
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Fig.  A-l  P-spline  optimization,  linear  model 
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A.5  Nonparametric  S-spline 

The  smoothing  spline  is  another  manifestation  of  penalized  maximum  likelihood  or 
least  squares  estimation.  A  smoothing  spline,  or  S-spline,  /  is  the  minimizer  of 


(A-29) 


where  /  is  in  some  suitable  function  space,  and  T)  is  a  linear  differential  operator. 
The  solution  is  a  spline  with  knots  at  the  data  points.  If  D  has  order  p,  the  solution 
is  piecewise  polynomial  of  degree  2p  -  1  with  continuous  derivatives  of  orders 
0, 1 , 2, . . . ,  2p  -  2.  See  WahbaT 

The  discrete  representation  of  this  problem  is  Eq.  A-21,  with  X  =  I  and  S  taken  to 
be  a  scalar  multiple  of  a  differential  operator  D,  so  S  =  AD  and  S' S  =  A2 Dr D.  The 
representation  D  of  D  is  given  in  Appendix  D.  The  discrete  formulation  provides 
the  exact  solution  of  Eq.  A-29. 

When  D  is  the  second  derivative,  the  solution  is  a  cubic  spline.  The  solution  is  given 
at  the  data  points,  and  spline  interpolation  can  be  used  to  evaluate  the  response  at 
other  values. 

Figure  A-2  depicts  the  effect  of  smoothing  parameter  variation  for  the  S-spline 
model. 


2Wahba  G.  Spline  models  for  observational  data.  Philadelphia  (PA):  Society  for  Industrial  and 
Applied  Mathematics;  1990. 
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Fig.  A -2  S-spline  optimization,  linear  model 
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A.6  Selecting  the  Smoothing  Parameter 


Choice  of  smoothing  parameter  A  is  usually  based  on  minimization  of  one  of  the 
following  quantities.  The  ordinary  cross-validation  (OCV)  is 


OCV  =  - 
n 


v  ( ~  y«- 


The  generalized  cross-validation  (GCV)  is 


GCV  = 


T,rU(yi-yi)2 

[iTr(/-H)J2 


The  Akaike  information  criterion  (AIC)  is 


(A-30) 


(A-31) 


AIC  =  A  +  2-Tr  H , 


(A-32) 


where  the  deviance  A  =  -21og(Lreduced/Cfuii)  is  defined  in  terms  of  the  likelihood 
function  L. 

For  normal  error,  Lf„ii  =  1,  and  based  on  Eq.  A-2,  we  have 

VL  11 

log  ^reduced  —  log(27r)  —  —  log  |X|  —  —  1 1 C | |^_i  , 

where  the  residuals  are  s  =  Y  -  X  (3  and  the  error  variance  is  X. 

For  independent  and  identically  distributed  (IID)  errors,  X  =  cr2In  and  X 
And  so 

n  n  9  1  . 

log  T reduced  =  log(2^)  -  -  log^)  -  £  . 

We  replace  cr 2  by  the  estimator  and  get 

,  ,  n ,  n  u  s<£\  ,1 

log  Treduced  =  -- log(2^)  -  -  log(  — J  ~^  =  --[log[2n  —  )  +  1J 

so  t 

A  =  /7[log(2;r— )  +  lj  .  (A-36) 

See  Fig.  A- 3  for  graphs  of  OCV,  GCV,  and  AIC  as  functions  of  A.  Note  that  AIC 
optimization  fails  for  the  S-spline,  as  no  minimum  value  is  obtained.  See  Fig.  A-4 
for  GVC-optimal  P-spline  and  S-spline  solutions.  With  a  maximum  y -difference  of 
less  than  0.01,  the  solutions  are  practically  indistinguishable. 
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=  (o-2)". 
(A-34) 

(A-35) 
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Fig.  A-4  GCV-optimal  fits,  linear  model 


Approved  for  public  release;  distribution  is  unlimited. 


31 


Intentionally  left  blank. 
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Appendix  B.  The  Generalized  Linear  Model 
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B.1  Generalized  Linear  Model  Formulation 


In  the  Generalized  Linear  Model  (GLM),  the  response  Y  has  an  arbitrary  distribu¬ 
tion,  T]  =  X  (5  is  a  linear  function  of  the  parameter  (3 ,  and  the  mean  response  is 
modeled  as 

fi  =  E[Y\X]  =  G(r1)  (B-l) 

for  some  monotone  link  function  G  with  derivative  g  =  G' .  (Some  authors  call  G-1 
the  link.)  Response  distributions  are  taken  to  be  from  a  single -parameter  exponential 
family,  with  the  form 


=  exp 


yd  -  b{6) 


+  c(y,tfr) 


The  parameter  0  is  to  be  estimated,  and  f  is  a  nuisance  parameter. 


(B-2) 


With  £  =  log  /,  we  calculate  the  moments  of  Y  in  terms  of  exponential  family 
components. 


E 


—£(y,e,i//) 

dO 


=  0 


(B-3) 


because  f  f(y,8,if/)  dy  =  1 ,  and  under  suitable  regularity  conditions  f  f(y,  Q,\f/)  dy 
=  I  lef(y^^)dy  =  I  ^(y,0,iA)-/(;y,0,  tfr)dy  =  0.  Therefore,  E[(y  -  b'(6)) /a(<J/)\ 
=  0  and 

Em  =  b  =  G(1 7)  =  b\8) .  (B-4) 


Also,  as  usual, 


7  d  \2' 

dr 

E 

(' Te eiyA'")) 

=  -E 

dPew>,*) 

because  Var  [^f]  =  E[(if)2]  =  /(^f)2  •  f  dy  =  /  if  •  dy 

=  &-f\-~-Jp-fdy  =  -E[p]  ■ 


Therefore,  E[(y  -  jj)2  /  a(i]j)2]  =  -  E[-b"  (8)  /  and 


Var[y]  =  v(n)a(fi)  =  b"(8)a(i//)  (B-6) 

where  v(n )  =  Var[T]/a(i/f)  =  b"(8)  =  g(r 7)^. 

In  the  case  that  rj  =  8,  and  hence  /u  =  G{tj)  =  G(8),  we  say  that  G  is  the  canonical 
link  function.  Then  /u  =  b\8 )  =  G(8)  =  G(ij)  and  v(pi)  =  b"{8 )  =  g(8)  =  g(r]). 
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B.2  Estimation 


Let  [jci,  . . .  ,xn ]  =  X',  so  the  (column)  vector  xi  is  row  i  of  X,  r/i  -  x^/3,  and 
9j  =  O(rji).  Maximum  likelihood  estimation  for  the  GLM  is  accomplished  by  max¬ 
imizing  the  log-likelihood  function 


i=  1 


i= t  l 


yidj  -  b(0j ) 


+  c(y/,iA) 


(B-7) 


This  is  a  weighted  least  squares  problem  where  the  design  and  weight  depend  on 
the  unknown  parameter,  and  it  can  be  solved  iteratively  by  the  Newton-Raphson 
method. 


B.2.1  The  Newton-Raphson  Method 

In  1  dimension,  we  find  a  zero  of  F  by  linearizing  and  updating  the  current  argument 
x0  to  .x  by  solving  F(xa)  +  (x  -  x0)F'(x0 )  =  0  to  get  x  =  xa  -  F(x0)/F'(x0).  We 
optimize  F  by  setting  F'  =  0,  so  the  update  is  x  =  xa  -  F'(x0)/F"(x0). 

The  vector  version  is  x  =  xa  -  [j^rF(x0)Yl  -^F(x0).  If  we  take  x  =  xa  +  8, 
the  increment  8  -  x  -  x0  satisfies  [ j^F(x0)]S  =  F(xa ).  So  we  need  some 
derivatives. 


B.2. 2  Gradient 

Differentiating,  we  have  the  gradient  (vector  of  first  derivatives) 

£>(/?)  =  ^  =  «(«A)_1  •  J]  [>’'  "  b' (0«)]  '  •  (B-8) 

“  i=i  “ 


Since  jpbXQ,) 


v(^)§  and  -£W)  4,G(m)  =  £G(x</3) 


dm 


d/ 3 


dp 


dp  ~  dp' 


dp' 


g(rji)xt,  we  get 


So  the  gradient  is 


dOj  _  gim ) 
d/3  v(idi)Xl' 


(B-9) 


£>(/?)  =  a( t/0  1  •  ^  ( yt  -  Hi)  ■ 

i=  1 


=  a^y1  ■  X'WFUF  , 

v(/h) 


(B-10) 
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where, forz  =  1, the  diagonal  weight  matrix  Wp  has  elements  wpi  =  g(.r/i)2/v(ni) 


Wf  =  diag 


gO 7i)2 

-  v(yUl) 


g(Un)2' 

v{Hn)  - 


(B-ll) 


and  the  centered/scaled  response  vector  Uf  has  elements  upi  =  (  Vi  -  Hi)/g(di) 


y  1  -  Hi  _  _  _  y_n  -  Hn 

-  g(m)  ’  ’  gidn)  .  ' 


(B-12) 


B.2.3  Hessian 


Using  =  v'(ni) ipCix^)  =  v'^OgOE /?)*,-, 


dp 


dp 


d  n  _  g'(w)v(Hi)  ~  g{rji)-v'{in )  t 

“i  —  .  xixi  ■> 


dP?"'  v(fn)2 

and  the  Hessian  (matrix  of  second  derivatives)  is 

d 2 


(B-13) 


W)^x  =  aW_1'2 


;=i  L 

n  r 


K<A)  1  • 


i=1  L 


>  rr,n  ^d6i  dOj’  .  .  J2 

*  rfyS  +  “  "f)  dwe‘ 


g(>v)2  ,  ,  *  g'idiMdi)  -  g(di)2v'(vi) 

+  ( y;  -  /if) - 


v(/if) 


v(yW,) 


■  U 


Xi*i 


—a(yfj)~l  • 


where 


WN  =  WF-  Wd 

and  Wd  is  a  diagonal  matrix  with  diagonal  elements 

t  ,  g'idi)v(Hi)  -  g(rji)2v'(lii) 

wDi  =  {yi  -  Hi) - — o - 

V  (///)- 

Since  E[Wd]  =  0,  the  expected  value  of  the  Hessian  is 

E 'H (/?)  =  -r/(iA)_1  •  XfWFX  . 

The  Fisher  Information  Matrix  is 


(B-14) 


(B-15) 


(B  - 1 6) 


(B-17) 


Mp  =  E 


E 

\d-£-d-£’} 

=  -E 

r  d2 

.df5  dp 

[dpp! 

=  -E -XrWFX  ,  (B-18) 


Approved  for  public  release;  distribution  is  unlimited. 


36 


and  the  asymptotic  estimator  distribution  is  N(f3,a(i//)  ■  (X'WpX)  *). 

B.2.4  Newton-Raphson 

Now,  apply  the  Newton-Raphson  algorithm  to  iteratively  solve  the  optimization. 
For  GLM,  the  Newton-Raphson  update  is  (3  =  f30  +  6  where 

=  -D(Po) 

( XrWNX)S  =  X’WfUf 

( X‘WNX)S  =  X’WNUN  (B-19) 

with 

UN  =  W^WpUp  .  (B-20) 

These  are  the  normal  equations  for  minimization  of  Q  =  \\Un  -  X6\\^,n-  Both  VFy 
and  Uff  depend  on  /30.  The  normal  equations  can  be  solved  iteratively  with  an  initial 
guess  f30  by  calculating  77  =  X/30,  n  =  G(j]),  g,  g',  v,  v',  WF,  UF,  WD,  WN,  and 
UN.  Then  solve  for  6.  The  updated  solution  is  (5  =  fi0  +  S.  Now  replace  [5„  with  /l, 
and  repeat.  This  is  iteratively  reweighted  least  squares  with  the  Newton-Raphson 
update. 

B.2.5  Fisher  Scoring 

For  the  GLM,  the  Fisher  scoring  update  uses  E  'H  in  place  of  7T  to  get 

E"H(/30)6  =  -D(fio) 

C XtWFX)8  =  X'WfUf  ,  (B -21) 

which  are  the  normal  equations  for  minimization  of  Q  =  \\U[r  -  X6\\^V/  .  Both  XV p 
and  Up  depend  on  f3„.  The  normal  equations  can  be  solved  iteratively  with  an  initial 
guess  f50  by  calculating  ij  =  Xfi0,  /u  =  G(t]),  g,  v,  Wf,  and  Uf ■  Then  solve  for  6, 
update,  and  repeat  as  above.  This  is  iteratively  reweighted  least  squares  with  Fisher 
scoring. 

B.3  Smoothing 

Smoothing  is  accomplished  by  maximizing  a  penalized  objective 

£s  =  £-l\\Sf3\\2  (B-22) 
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with  gradient 

£>s(P)  =  =  X'WpUp  -  s'sp , 

dp 

(B-23) 

Hessian  matrix 

d~  .  , 

<Hs(P)  =  y^i-Cs  =  -X'WNX  -  S'S, 
dpp ’ 

(B-24) 

and  expected  Hessian 

E <HS(P)  =  -X'WpX  -S'S. 

(B-25) 

The  Newton-Raphson  update  equations  are 

W&M  =  -Ds(Po) 

(X'WNX  +  S'S)(P  -  p0)  =  X'WFUF  -  S'Sp0 

(. X'WNX  +  S'S)P  =  ( X'WNX  +  S'S)p„  +  X’WpUp  -  s'sp„ 

=  X'WNXp0  +  X'WpUp 
=  X'WNXpa  +  XtWNWp1WFUp 
=  XtWN(Xpo  +  W^WFUF) 

=  X'WN(Xpo  +  £/*) 

+  S'S)^  =  X'WNZN  ,  (B-26) 

where 

ZN  =  Xj80  +  £/*  .  (B-27) 

These  are  the  normal  equations  for  minimization  of  Q  -  II Zn  -  xp\\^N  +  \\sp\\-. 

The  update  increment  6  is  now  implicit.  Replace  fi(t  with  ft  and  repeat. 

The  Fisher  update  equations  are  E<Hs(Po)P  =  E (Hs(Po)Po  ~  ®s(Po),  or 

E  <Hs(Po)S  = -£>s(Po) 

(. X'WpX  +  StS)(f3  -  po)  =  X'WpUp  -  S'Spo 

(. X'WpX  +  S'S)P  =  ( X'WpX  +  S'S)p0  +  X'WpUp  -  S'Spo 
=  X'WpXp0  +  X'WpUp 
=  X'Wp(XPo  +  Up) 

( X'WpX  +  S'S)P  =  X'WpZp  ,  (B-28) 
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where 


Z/7  =  X  (30  +  Up  . 


(B-29) 


These  are  the  normal  equations  for  minimization  of  Q  =  \\ZF  -  X/3\\^f  +  ||S/?||2. 

At  each  step,  the  normal  equations  are  solved  as  the  augmented  system  of  Sec¬ 
tion  A. 3.  P-spline  models  are  obtained  with  a  B-spline  basis  as  in  Section  A.4,  and 
smoothing  spline  models  are  obtained  with  X  =  I  as  in  Section  A. 5. 

Figure  B-l  depicts  the  effect  of  smoothing  parameter  variation  for  the  P-spline 
model,  and  Fig.  B-2  depicts  the  effect  of  smoothing  parameter  variation  for  the 
S-spline  model.  See  Fig.  B-3  for  graphs  of  OCV,  GCV,  and  AIC  as  functions  of  A, 
and  see  Fig.  B-4  for  GVC-optimal  P-spline  and  S-spline  solutions. 
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Fig.  B-l  P-spline  optimization,  GLM 
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Fig.  B-3  Smoothing  parameter  selection,  GLM 
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Fig.  B-4  GCV-optimal  fits,  GLM 
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B.4  Canonical  Link 


For  the  canonical  link,  wfi  =  g(?7,)  -  v(//,)  and  u n  =  ( v,  -  Hi)/wFi-  Also,  since 
ddjldp  =  Xj  and  d2dild(3f3t  =  0,  it  follows  that  E'Hip)  -  So  Newton- 

Raphson  and  Fisher  scoring  are  equivalent. 

B.5  Confidence  Intervals 


Normal-approximation  100c%  confidence  intervals  on  the  mean  response  are  given 
by 


(B-30) 


where  O  is  a  standard  normal  quantile,  V  is  the  estimated  parameter  variance  matrix, 
and  x  is  a  row  of  an  X  matrix  corresponding  to  the  desired  level.  For  the  basis 
implementation,  this  is 


(B-31) 


B.6  Bernoulli  Response 

For  example,  suppose  the  response  Y  e  {0, 1}  is  Bernoulli  with  Pr[Y  =  1]  =  //  = 

1  -  Pr[  K  =  0].  The  Bernoulli  distribution  is  a  member  of  the  exponential  family, 
Eq.  B-2,  with  the  particular  form 


/(y)  =  yu-v(  1  -  yu)* 1  -v  =  exp  y  log  -1 —  +  log(l  -  h) 


(B-32) 


So  a  =  1,  c  =  0,  and  there  is  no  nuisance  parameter.  Furthermore,  6  =  log  (ju/(l  -  p)) 
and  [u  =  1/(1  +  e~e),  and  so  b(0 )  =  -log(l  -  /u)  =  log(l  +  e6).  Note  that 
E[Y]  =  b\6 )  =  ed/(l  +  ee)  =  n  and  Var[y]  =  v(/u)  =  b"{6)  =  e~e/(l  +  e~0)2  = 
H(  1  -  /r)  as  expected. 

With  77  =  6  and  /j.  =  G(9),  we  see  that  the  canonical  link  for  Bernoulli  response  is 
the  logistic  cumulative  distribution  function  (CDF)  G{tj)  =  1/(1  +  e ~T1)  =  /i.  Note 
that  g(7 7)  =  yu(l  -  //)  =  v(y u),  so  wFi  =  Hi ( 1  -  Hi)  and  uFi  =  (yt  -  Hi) I  (^/(  1  -  di))- 
The  resulting  model  is  logistic  regression,  or  the  logit  model. 

For  an  arbitrary  link  CDF  G,  we  take  77  =  X/3,  h  =  G(Xfi),  vQ u)  =  yu(l  -  77),  and 
g(77)  =  g(X(3). 

Use  of  the  standard  normal  CDF  G  =  ©  with  probability  density  function  (PDF) 
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g  =  0  gives  the  probit  model. 


Because  the  likelihood  function  is  l  =  we  have  Lmi  =  1  for 

the  Bernoulli  model,  and  then  the  deviance  is  A  =  -2  log  L. 

As  an  example,  consider  the  usual  2-parameter  model  with  predictor  (jci,.  . .  ,xn) 
and  response  (yi, . . .  ,yn).  The  increment  5  =  (do,d\)  is  the  solution  of  M5  =  A, 
where 


Af  =  X'lTX  = 

2  Wj 

2  Wj  Xi 

and  A  =  X'WU  = 

2  Wj  Ui 

2  wi  Xi 

2  mxl 

2  Wj  Ui  Xi 

To  do  the  Fisher  update  of  Section  B  .2.5,  calculate  the  linear  response  77,  =  bo+biXi, 
mean  //,■  =  G(rji),  derivative  g,  =  variance  v,  =  jJi(  1  -  ///),  transformed  re¬ 

sponse  Uj  =  uFi  =  (y,-  -  ndlgi,  weight  Wj  =  wFj  =  g~/vi,  and  weighted  transformed 
response  wm  =  wFi  uFi  =  (yt  -  m)gi/vi. 

For  the  Newton-Raphson  update,  woi  =  (.V;  -  udig'vi  -  g2v')/v2  and  vv;v,  =  w?;  - 
wDi  and  Uj  =  um  =  wFj/wNi(yj  -  Hi)/gj.  Then  wt  =  u’iV,  and  wjul  =  wNiuNi  = 

Wf/  M/y. 


For  the  canonical  link  Wj  =  wy,-  =  gj  =  v,-  and  w,w/  =  wFj  uFj  -  y;-  -  /i,-. 
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Appendix  C.  B-splines 
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A  degree- d  B-spline  basis  (B u,. . .  ,Bn^)  of  dimension  n  uses  n  +  d  +  1  knots 

(t  1  5  •  •  •  5  bl  +  rf+l)  ' 

There  are  k  =  n  —  d  -  1  internal  knots  (internal  to  the  data)  and  2d  +  2  boundary 
knots,  d  +  1  on  each  side.  There  are  no  internal  knots  if  k  =  0  and  n  =  d  +  1,  and  in 
general  k  >  0  so  n  >  d.  The  left  knots  are  (t\,. . .  ,td+i )>  the  internal  knots,  when 
n  >  d  +  2,  are  (tj+2,  ■  ■  ■  ,tn),  and  the  right  knots  are  (tn+ i, . . .  ,tn+d+ 1). 

On  the  data  range  [xo,x  \  ]  internal  knots  are  evenly  spaced  so  t,  =  xo  + 1~^+\  (x  t  -*o) 
for  i  =  d  +  2, . . .  ,n.  This  also  accounts  for  td+  i  =  xo  and  tn+ 1  =  x\. 

Compact  knots  are  constructed  by  replicating  knots  at  the  boundary,  so  t\  =  •  •  •  = 
O+i  =  xq  and  tn+\  =  •  •  •  =  tn+d+\  =  xi. 


Uniform  knots  are  constructed  by  repeating  the  uniform  spacing  for  all  knots,  so 
ti  =  xo  +  (xi  -  xo)  for  i  =  1,. . .  ,n  +  d  +  1. 

This  is  the  Cox-de  Boor  recursion  for  B-spline  basis  functions: 


Bi,o(x)  = 


1 ,  ti  <  x  <  ti+l 
I  0 ,  otherwise 


/  =  1, . . .  ,n  +  d  ; 

X  t(  ^i+j+l  ^ 

Bjj(x)  =  - '-Bij- i(x)  +  — - —Bi+1J- i(x) 


li+j 


ti 


j  =  l,..., d 


ti+j+l  tj+ 1 
/  =  0 .  n  +  d  -  j  . 


(C-l) 


This  is  the  relation  for  derivatives  of  B-spline  basis  functions: 


ti+j  tj 


Bu-i(x)  - 


J 


ti+j+ 1  ti+ 1 


i+hj- 1 


(x)  . 


(C-2) 


Example  graphs  follow.  All  spline  bases  have  dimension  «  =  8.  Figures  C-l  and 
C-2  demonstrate  uniform  and  compact  knots,  respectively,  for  bases  of  degree  d  = 
0, 1,2,  and  3.  Figures  C-3  and  C-4  demonstrate  uniform  and  compact  knots,  respec¬ 
tively,  for  bases  of  degree  d  =  3  with  derivative  orders  p  =  0, 1,2,  and  3. 
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Fig.  C-l  B-spline  basis,  uniform 
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Fig.  C-2  B-spline  basis,  compact 
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Fig.  C-3  B-spline  basis  derivatives,  uniform 
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Fig.  C-4  B-spline  basis  derivatives,  compact 
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Appendix  D.  Matrix  Differentiation  Operators 
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Consider  that  y  =  f(x),  so  y  is  a  function  of  x,  and  n  values  of  Zi  =  ( Xi,yt )  for 
i  =  1,. . .  ,n  are  given,  where  each  x\  <  Xj+ j.  Let  J\,p  be  the  degree-/)  interpolating 
polynomial  through  the  p  +  1  consecutive  points  (zk,  ■  ■  •  ,Zk+P ),  so  that  fk,P(xi )  =  >7 
for  k  <  i  <  k  +  p.  Interpolating  polynomials  satisfy  the  recursion 


fk,p(x) 


(x  -  Xk)fk+\,p-l(x)  +  ( Xk+p  -  x)fk,p-l(x ) 
Xk+P  —  %k 


Define  the  Newton  basis  polynomials  nkj  by 


(D-l) 


nk,o(x)  =  1  ,  1  <  k  <  n 

k+j-l 

nk,j(x )  =  Y\  (x  ~  xi)  >  1<&<«,1< /'<«-&,  (D-2) 

i-k 

so  rikj(xi)  =  0  for  j  >  1  and  k^i^k+j  -  1.  Note  that  has  degree  j  and  the 
leading  coefficient  1,  so  nkj(x)  =  xJ  +  lower-degree  terms.  Write  the  polynomial  as 

p 

fk,P(-x )  =  Y^ak’jnk'j  (^)  (°-3) 

7=0 

so  that 


A,oU)  =  ak,onkp(x) ,  1  <  £  <  n 

/*,/(*)  =  akjnkj(x)  +  fkj- iW  ,  1  <  j  ,  (D-4) 

and  the  leading  coefficient  of  fkj  is  akj-  Then,  because  of  Eqs.  D-l  and  D-4,  the 
coefficients  obey 


a-k.Q  =  yk 

Qk+lj—l  ~  ®kj- 1 

&kj  — 

Xk+j  -  Xk 


(D-5) 
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The  coefficient  sequence  is 


(  0  ,  Clk,  1  Uk2  ,  Clk,3  ,  UkA  ,•••)  = 


yk+i-yk+2  yk+i-yk+i  y<r+2~y<.-+i  yt+i-yt 
yk+2~yk  +  \  _  yk+\~yk  xk+3~xk+2  xk+2~xk+ 1  _  xk+2~xk  +  l  xk+l~xk 
yk+ 1  3T  %k+2~  %k+l  Xk+\~*k  ^/r+3-  3f£+l  Xk+2~%k 

yk  •>  •>  ? 

-^£+1  —  %k+2  ~  %k  %k+ 3  — 


y<r+4~y<r+3  y<r+3~y<r+2  y<r+3~y<r+2  yfc+2~yfc+l  y<r+3~y<r+2  y<r+2~y<r+l  Vk+2-yk+l  yk+l-yk 


xk+ 4  xk+ 3 

xk+3~xk+2 

xk+3~xk+2  xk+2~xk+l 

xk+3~xk+2 

xk+2~xk+l 

xk+2~xk+l  xk+l~xk 

xk+4~ 

~xk+2 

xk+3~xk+l 

xk+  3‘ 

~xk+l 

xk+2~xk 

%k+ 4  %k+ 1 

%k+ 3  %k 

Xk+ 4  %k 


(D-6) 


For  evenly  spaced  xj  with  Xj+\  -  xj  =  h  and  Xk+j  -  *k  =  jh,  we  have 


so 


=  yk 


ak,j  ~ 


ak+l,j-l  -  ak,j- 1 

jh 


(D-7) 


(  Clkfi  ,  «/t,2  ,  ,  a^-,4  ,■■■)  = 

yk+ 1  -  y*  yfc+2  -  2yk+1  +  yk  yk+3  -  ^n+2  +  3yfc+i  -  yk 

h  ’  2h2  ’  6/i3 

y<r+4  -  4yk+ 3  +  6vfc+2  -  4y^+i  + 

24  h4 

For  evenly  spaced  x,  the  coefficients  are 


(D-8) 


akJ  ~ 


(D-9) 


In  general,  the  numerical  derivative  of  order  p  coincides  with  the  pth  derivative 
of  fk,p, 

f(kPp(x)  =  p\  apP-  (D-10) 

There  are  n  -  p  sequences  (z.k,-  ■  ■  ,Zk+P )  and  therefore  n  -  p  values  of  f(kp)p  for 
1  <  k  <  n  -  p. 

Now,  express  the  differentiation  operator  as  a  matrix. 
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Define  v(J),  the  first-order  difference  matrix  of  dimension  k  >  2,  size  (J  -  1)  x  j, 
by  y(J)a  =  -1  and  v(J)u+ 1  =  1  for  1  <  i  <  j  -  1. 


v(/') 


-1  1 
-1 


-1  1 


(D-ll) 


Then 


y(j) 


Ml 

U2  ~  Ml 

llj 

Uj  -  Uj- 1_ 

(D-12) 


and  y  serves  to  evaluate  the  numerator  of  Eq.  D-5. 

Define  8(u,j),  the  lag-j  (square)  diagonal  difference  matrix  of  dimension  n  —  j  for 
u  =  (u i, . . .  ,un )  and  1  <  j  <  n  -  1  by  8(u,j)jj  =  uj+j  -  Ui  for  1  <  i  <  n  -  j 


Uj+ 1  -  u  i 


5(w,y)  = 


(D-13) 


Mn  M-n—j 

So  8  serves  to  evaluate  the  denominator  of  Eq.  D-5. 

Then  matrix  derivative  operators  for  the  n  -vector  x  are  given  recursively  by 

D\  =  8(x,  l)-1  v(n) 

Dj  =  j  S(xjrl  v(n  -j  +  1)  Dj-u  (D-14) 


where  the  factor  of  j  serves  to  evaluate  the  factorial  in  Eq.  D-10. 

The  matrix  Dp  evaluates  the  pth  derivative. 

Based  on  points  (zk,  ■  ■  ■  ,Zu+P )  for  1  <  k  <  n -p,  the  individual  derivative  value  are 

f(kPl(x)  =  (Dp  y)k  .  (D-15) 
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The  derivative  vector  is  Dp  y. 


DPy  = 


/>> 

/&’(*> 


Ap) 

J  n-k,p 


O) 


and  its  norm  is  approximately 


dx*\\Dpy\\2  =  ytDtDpy. 


(D-16) 


(D-17) 


R  language  code  for  evaluating  Dp  is  easy. 

D  <-  diff (diag(n))  /  diff(x,  lag=l) 
if  (p>l)  for  (j  in  2:p) 

D  <-  j  *  (diff(diag(n  -  j  +  1))  /  diff(x,  lag=j))  %*%  D 
Here,  diff (diag(n))  is  v(n )  and  diff(x,  lag=j))  is  8(x,j). 
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List  of  Symbols,  Abbreviations,  and  Acronyms 


AIC 

Akaike  information  criterion 

CB 

Chang-Bodt 

CDF 

cumulative  distribution  function 

GCV 

generalized  cross-validation 

GLM 

Generalized  Linear  Model 

IID 

independent  and  identically  distributed 

IRLS 

iteratively  reweighted  least  squares 

OCV 

ordinary  cross-validation 

PDF 

probability  density  function 

QR 

quantal  response 

SSE 

sum  of  squared  errors 
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